In this practice activity you’ll continue to work with the titanic dataset in ways that flex what you’ve learned about both data wrangling and data visualization.
```python
import pandas as pd
import numpy as np
import plotly.express as px

data_dir = "https://dlsun.github.io/pods/data/"
df_titanic = pd.read_csv(data_dir + "titanic.csv")

# Keep only rows that have class & embarked info
# (and, if class is missing but pclass exists, construct class)
df = df_titanic.copy()
if "class" not in df.columns and "pclass" in df.columns:
    # Use the same labels as this dataset's "class" column ("1st", "2nd", "3rd")
    _map = {1: "1st", 2: "2nd", 3: "3rd"}
    df["class"] = df["pclass"].map(_map)
df = df.dropna(subset=["class", "embarked"])
df.head()
```
| | name | gender | age | class | embarked | country | ticketno | fare | survived |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbing, Mr. Anthony | male | 42.0 | 3rd | S | United States | 5547.0 | 7.11 | 0 |
| 1 | Abbott, Mr. Eugene Joseph | male | 13.0 | 3rd | S | United States | 2673.0 | 20.05 | 0 |
| 2 | Abbott, Mr. Rossmore Edward | male | 16.0 | 3rd | S | United States | 2673.0 | 20.05 | 0 |
| 3 | Abbott, Mrs. Rhoda Mary 'Rosa' | female | 39.0 | 3rd | S | England | 2673.0 | 20.05 | 1 |
| 4 | Abelseth, Miss. Karen Marie | female | 16.0 | 3rd | S | Norway | 348125.0 | 7.13 | 1 |
1. Filter the data to include passengers only. Calculate the joint distribution (cross-tab) between a passenger’s class and where they embarked.
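A minimal sketch of question 1, assuming that "passengers" means the rows whose `class` is `1st`, `2nd`, or `3rd`, so any other label (for example a crew category) is dropped. To keep the example self-contained it runs on a small hypothetical DataFrame called `toy` instead of the real file; on the real data you would use `df` from the setup cell. With `normalize=True`, `pd.crosstab` divides every cell by the grand total, so the table is the joint distribution P(class, embarked) and all of its cells sum to 1.

```python
import pandas as pd

# Hypothetical toy data (not the real Titanic file); "deck crew" stands in for a crew row
toy = pd.DataFrame({
    "class":    ["1st", "1st", "2nd", "3rd", "3rd", "3rd", "3rd", "deck crew"],
    "embarked": ["C",   "S",   "S",   "S",   "S",   "Q",   "C",   "S"],
})

# Assumption: passengers are exactly the rows whose class is 1st, 2nd, or 3rd
passenger_classes = ["1st", "2nd", "3rd"]
passengers = toy[toy["class"].isin(passenger_classes)]

# Joint distribution: each cell is (count for that class/port pair) / (number of passengers)
joint = pd.crosstab(passengers["class"], passengers["embarked"], normalize=True)
print(joint)
```

Filtering with `isin` on the passenger labels, rather than excluding crew labels one by one, keeps the sketch correct even if the crew categories are named differently than assumed here.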
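Most 3rd-class passengers (≈70%) embarked at Southampton, that is, P(embarked = S | class = 3rd) ≈ 0.70. Conditioning the other way, about 31% of all Southampton passengers were 3rd class, that is, P(class = 3rd | embarked = S) ≈ 0.31. So 3rd class boarded mainly at Southampton, while 1st class made up a larger share of the passengers at Cherbourg than at Southampton.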
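The two percentages above come from different conditional distributions, and `pd.crosstab` can produce either one through its `normalize` argument: `normalize="index"` makes each row (one class) sum to 1, giving P(embarked | class), while `normalize="columns"` makes each column (one port) sum to 1, giving P(class | embarked). This is a minimal sketch on hypothetical toy data standing in for the filtered passenger table; `cond_class_given_embarked` matches the name the plotting code expects, and `cond_embarked_given_class` is an illustrative name.

```python
import pandas as pd

# Hypothetical toy passenger data (not the real Titanic file)
passengers = pd.DataFrame({
    "class":    ["1st", "1st", "2nd", "2nd", "3rd", "3rd", "3rd", "3rd"],
    "embarked": ["C",   "S",   "S",   "S",   "S",   "S",   "Q",   "C"],
})

# P(embarked | class): normalize each row (class) so it sums to 1
cond_embarked_given_class = pd.crosstab(
    passengers["class"], passengers["embarked"], normalize="index"
)

# P(class | embarked): normalize each column (port) so it sums to 1
cond_class_given_embarked = pd.crosstab(
    passengers["class"], passengers["embarked"], normalize="columns"
)

# Share of 3rd-class passengers who boarded at Southampton: 2 of 4
print(cond_embarked_given_class.loc["3rd", "S"])  # → 0.5
# Share of Southampton boarders who were 3rd class: 2 of 5
print(cond_class_given_embarked.loc["3rd", "S"])  # → 0.4
```

The same cell (3rd class, Southampton) gives different numbers depending on which margin you divide by, which is why the two statements in the answer are not contradictory.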
3. Make a visualization showing the distribution of a passenger’s class, given where they embarked.
Discuss the pros and cons of using this visualization versus the distributions you calculated before, to answer the previous questions.
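```python
# P(class | embarked) for passengers only: each column (port) sums to 1
passengers = df[df["class"].isin(["1st", "2nd", "3rd"])]
cond_class_given_embarked = pd.crosstab(
    passengers["class"], passengers["embarked"], normalize="columns"
)

# Reshape to long format: one row per (class, embarked) pair
viz_df = (
    cond_class_given_embarked
    .reset_index()
    .melt(id_vars="class", var_name="embarked", value_name="proportion")
)

# Grouped bars: one group per port, one bar per class
fig = px.bar(
    viz_df,
    x="embarked",
    y="proportion",
    color="class",
    barmode="group",
    text=viz_df["proportion"].map(lambda x: f"{x:.2f}"),
)
fig.update_layout(
    title="Distribution of Passenger Class, Given Where They Embarked (P(class | embarked))",
    yaxis=dict(title="Proportion", tickformat=".0%"),
    xaxis_title="Embarkation Port",
)
fig.show()
```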
Pros: it is easy to compare class proportions within each embarkation port.
Cons: it is harder to see that each port's proportions sum to 1, though the chart is still much clearer visually than the tables.